Predictive test for prognosis of myelodysplastic syndrome patients using mass spectrometry of blood-based sample

ABSTRACT

A method of predicting whether an MDS patient has a good or poor prognosis uses a general purpose computer configured as a classifier and mass-spectrometry data obtained from a blood-based sample. The classifier assigns a classification label of either Early or Late (or the equivalent) to the patient's sample. Patients classified as Early are predicted to have a poor prognosis or worse survival, whereas those patients classified as Late are predicted to have a relatively better prognosis and longer survival time. The groupings demonstrated a large effect size between groups in Kaplan-Meier analysis of survival. Most importantly, while the classifications generated were correlated with other prognostic factors, such as IPSS score and genetic category, multivariate and subgroup analysis showed that they had significant independent prognostic power complementary to the existing prognostic factors.

PRIORITY

This application is a continuation of U.S. Ser. No. 15/899,866, filed on Feb. 20, 2018, which is a divisional application of U.S. Ser. No. 14/946,045, filed on Nov. 19, 2015, which claims priority benefits under 35 U.S.C. § 119 to U.S. Provisional application Ser. No. 62/086,807, filed Dec. 3, 2014, all of which are incorporated by reference herein.

BACKGROUND

Myelodysplastic syndrome (MDS) comprises a heterogeneous group of myeloid hemopathies, characterized by ineffective bone marrow production of the myeloid class of blood cells. While the incidence of MDS in the general population is only about 4.5 per 100,000 people, the incidence of MDS rises steeply with age, such that the incidence rate exceeds 50 per 100,000 people over 80 years of age. MDS can arise de novo or occur after chemotherapy or radiation for prior malignancies. The major clinical problems facing patients with MDS are morbidities due to cytopenias (low blood counts) and the progression of MDS into acute myeloid leukemia (AML). National Comprehensive Cancer Network (NCCN) Guidelines, Version 1 Myelodysplastic Syndromes (2015).

MDS can be subdivided into several categories. Until recently, the French-American-British (FAB) categorization was used. This schema divided the syndrome into the five categories outlined in Table 1. This categorization has now been superseded by the World Health Organization (WHO) classification, also described in Table 1.

TABLE 1 French-American-British and WHO categories of MDS

FAB category | WHO category | Description
Refractory anemia (RA) | Refractory anemia (RA) | Blasts < 5% in bone marrow, median survival from 2-5 years, progression to AML rare, 20%-30% of MDS patients.
— | Refractory cytopenia with multilineage or unilineage dysplasia | Blasts < 5% in bone marrow, multiple or single cytopenia present, median survival 33 months, 11% progression to AML, 15% of MDS patients.
Refractory anemia with ring sideroblasts (RARS) | Refractory anemia with ring sideroblasts (RARS) | Ring sideroblasts > 15% of marrow red cell precursors, 1%-2% progress to AML, prognosis as for RA, 10%-12% of MDS patients.
Refractory anemia with excess blasts (RAEB) | Refractory anemia with excess blasts -1 and -2 | RAEB-1: 5%-9% blasts in bone marrow and < 5% blasts in blood, median survival 18 months, 25% of patients progress to AML. RAEB-2: 10%-19% blasts in bone marrow, median survival 10 months, 33% of patients progress to AML. RAEB-1 + RAEB-2 accounts for 40% of MDS patients.
MDS, unclassifiable | MDS associated with isolated del(5q) | Blasts in blood < 5%, associated with del(5q) cytogenetic abnormality. Long survival.
Refractory anemia with excess blasts in transformation (RAEB-t) | Reclassified from MDS to AML | Multiple cytopenias, 5%-19% blasts in blood, 20%-30% blasts in bone marrow.
Chronic myelomonocytic leukemia (CMML) | Reclassified from MDS to myelodysplastic and myeloproliferative diseases | < 20% blasts in bone marrow and > 10⁹ monocytes/L in blood.

Several prognostic scoring systems have been developed to help to stratify MDS for appropriate treatment regimens. These include the International Prognostic Scoring System (IPSS), the revised IPSS (IPSS-R), and the WHO-based Prognostic Scoring System (WPSS). The original IPSS score ranges from 0 (low risk) to over 2.5 (high risk) and is based on the percentage of blasts in marrow, the karyotype, and the presence and multiplicity of cytopenias. This scoring system proved inadequate for accurate patient stratification, resulting in the development of the IPSS-R (ranging from 0 (very good prognosis) to over 4 (very poor prognosis)), which uses a combination of cytogenetic status, percentage of marrow blasts, hemoglobin level, platelet level, and ANC level, and the WPSS (ranging from 0 (very low risk) to 6 (very high risk)), which combines WHO MDS category, karyotype, and presence of severe anemia.

In addition to these commonly used prognostic scores, there have been some investigations into blood-based biomarkers that might be able to improve patient stratification. One particular study, using the same samples that were made available to us for this project, studied serum CD44 (sCD44) and showed that it had potential to add information to the existing IPSS scoring system.

Loeffler-Ragg J. et al., Serum CD44 levels predict survival in patients with low-risk myelodysplastic syndromes, Crit. Rev. Oncol. Hematol. vol. 78 no. 2 pp. 150-61 (2011). In addition, attempts have been made to refine risk stratification or identify patients most likely to develop secondary acute myelogenous leukemia using circulating microRNAs (see Zuo Z., et al., Circulating microRNAs let-7a and miR-16 predict progression-free survival and overall survival in patients with myelodysplastic syndrome, Blood vol. 118 no. 2 pp. 413-5 (2011)) or gene expression profiles by microarrays (see Mills K. I., et al., Microarray-based classifiers and prognosis models identify subgroups with distinct clinical outcomes and high risk of AML transformation of myelodysplastic syndrome, Blood vol. 114 no. 5 pp. 1063-72 (2009)). These potential biomarkers remain to be validated in independent datasets.

Other prior art of interest includes Garcia-Manero, G., Myelodysplastic syndromes: 2014 update on diagnosis, risk stratification, and management, Am. J. Hemat., Vol. 89, No. 1, pp. 98-18 (Jan. 2014); Bejar, R., Prognostic models in myelodysplastic syndromes, Am. Soc. of Hematology, pp. 504-510 (2013); Adés, L. et al., Myelodysplastic Syndromes, The Lancet, vol. 383 pp. 2239-52 (Jun. 26, 2014); and Westers, T. M., et al., Aberrant immunophenotype of blasts in myelodysplastic syndromes is a clinically relevant biomarker in predicting response to growth factor treatment, Blood vol. 115 pp. 1779-1784 (2010).

Treatment for MDS patients is determined by risk category. High-risk and some intermediate-risk patients, if considered candidates for intensive therapy, will receive hematopoietic stem cell transplants (HSCTs) or high intensity chemotherapy. Patients in this risk group unsuitable for intensive therapy receive azacitidine or decitabine and/or supportive care. Treatment for these high-risk, very poor prognosis patients is relatively well determined and there is little unmet need for additional tests in this patient subgroup. On the other hand, treatment alternatives for low-risk and other intermediate-risk patients are more varied, including lenalidomide, immunosuppressive therapy, or possibly azacitidine or supportive therapy, and here better tools for patient stratification would be beneficial. In particular, it would be clinically useful to improve on the prognostic scoring systems to determine which patients might have a significantly worse than average prognosis, as such systems can guide treatment of MDS patients. This invention meets that need.

SUMMARY

In a first aspect, a method for predicting prognosis of a myelodysplastic syndrome (MDS) patient is disclosed. The method includes a step of performing MALDI-TOF mass spectrometry on a blood-based sample obtained from the MDS patient by subjecting the sample to at least 100,000 laser shots and acquiring mass spectral data. This step can preferably make use of the so-called “deep MALDI” mass spectrometry technique described in U.S. patent application of H. Rőder et al., Ser. No. 13/836,436, filed Mar. 15, 2013, U.S. patent application publication no. 2013/0320203, assigned to the assignee of this invention, the contents of which are incorporated by reference herein, including automatic raster scanning of a spot on a MALDI plate and summation of spectra from multiple spots. The method includes a step of obtaining integrated intensity values in the mass spectral data of a multitude of pre-determined mass-spectral features, such as 50, 100, or all of the features listed in Appendix A. The method further includes the step of operating on the mass spectral data with a programmed computer implementing a classifier. The operating step compares the integrated intensity values with feature values of a reference set of class-labeled mass spectral data obtained from a multitude of MDS patients with a classification algorithm and generates a class label for the sample, wherein the class label is associated with a prognosis of the MDS patient.

In a preferred embodiment, the classifier is configured as a combination of filtered mini-classifiers using a regularized combination method, using the techniques described below and in the pending U.S. patent application of H. Rőder et al., Ser. No. 14/486,442, filed Sep. 15, 2014, U.S. patent application publication no. 2015/0102216, assigned to the assignee of this invention, the content of which is incorporated by reference herein.

In one embodiment, the obtaining step obtains integrated intensity values of at least 50 features listed in Appendix A, at least 100 features listed in Appendix A, or alternatively at least 300 features listed in Appendix A, such as all 318 features.

The classifier assigns a classification label of either Early or Late (or the equivalent) to the patient's sample. Patients classified as Early are predicted to have a poor prognosis or worse survival, whereas those patients classified as Late are predicted to have a relatively better prognosis and longer survival time. The groupings demonstrated a large effect size between groups in Kaplan-Meier analysis of survival. Most importantly, while the classifications generated were correlated with other (known) prognostic factors, such as IPSS score and genetic category, multivariate and subgroup analysis showed that they had significant independent prognostic power complementary to the existing prognostic factors.

In another aspect, a classifier is disclosed for predicting the prognosis of an MDS patient. The classifier includes a memory storing a reference set of mass spectral data obtained from blood-based samples of a multitude of MDS patients, such as feature values of the features listed in Appendix A or some subset of such features, such as 50 or 100 of such features. The classifier also includes a programmed computer coded with instructions for implementing a classifier configured as a combination of filtered mini-classifiers with drop-out regularization.

In another aspect, a laboratory testing system for conducting tests on blood-based samples from MDS patients and predicting the prognosis of the MDS patients is disclosed. The laboratory testing system includes a MALDI-TOF mass spectrometer configured to conduct mass spectrometry on a blood-based sample from a patient by subjecting the sample to at least 100,000 laser shots and acquire resulting mass spectral data; a memory storing a reference set of mass spectral data obtained from blood-based samples of a multitude of MDS patients and associated class labels; and a programmed computer coded with instructions to implement a classifier configured as a combination of filtered mini-classifiers with drop-out regularization. The reference set of mass spectral data includes feature values of at least some of the m/z features listed in Appendix A. The programmed computer is programmed to generate a class label for the sample associated with the prognosis of the MDS patient, such as Early or Late.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a flow-chart showing a classifier development methodology we used to create the classifiers disclosed in this document. The methodology uses mass-spectral data associated with blood-based samples obtained from patients with MDS.

FIGS. 2A-2D are box and whisker plots showing the results of a normalization step in the preprocessing of mass spectral data to construct the classifiers of this disclosure.

FIGS. 3A-3F are Kaplan-Meier plots for survival for patients with MDS (and having either RA, RAEB, RARS, or CMML; see Table 1 for the definitions) according to classifier label, Early or Late. The results are shown at each stage of the class label refinement process (i.e., successive iterations of loop 142 of FIG. 1) from FIG. 3A through FIG. 3F. The final classifier selected at step 144 of FIG. 1 as a majority vote of the master classifiers is shown in the last panel, FIG. 3F.

FIG. 4 is a Kaplan-Meier plot for survival for patients with either RA, RAEB or RARS by prognostic classification label, Early or Late, assigned with the same final classifier associated with the Kaplan-Meier plot of FIG. 3F.

FIGS. 5A-5I are Kaplan-Meier plots for survival of various prognostic sub-groups by Early and Late classifications, with FIG. 5A showing the plot for the patients with IPSS score of 0; FIG. 5B showing the plot for the patients with IPSS score of 0.5-1; FIG. 5C showing the plot for the patients with IPSS score of between 0 and 1; FIG. 5D showing the plot for the patients with IPSS score >1; FIG. 5E showing the plot for “Good karyotype” patients; FIG. 5F showing the plot for “Gene: del5” patients; FIG. 5G showing the plot for “Gene: Intermediate” patients; FIG. 5H showing the plot for biomarker “sCD44 low” patients; and FIG. 5I showing the plot for biomarker “sCD44 high” patients.

FIGS. 6A-6F are Kaplan-Meier plots for survival for MDS patients with RA according to classifier label, Early or Late. The results are shown at each stage of the class label refinement process (successive iterations of loop 142 in FIG. 1). The final classifier result is shown in panel FIG. 6E, and the last panel, FIG. 6F, compares the results for this classifier with the stratification obtained from the classifications for the RA samples from the classifier trained on all MDS samples (performance results shown in FIG. 3).

FIGS. 7A-7F are Kaplan-Meier plots for survival for patients with RAEB according to classifier label, Early or Late. The results are shown at each stage of the class label refinement process (successive iterations of loop 142 of FIG. 1). The final classifier result is shown in panel FIG. 7E, and the last panel, FIG. 7F, compares the results for this classifier with the stratification obtained from the classifications for the RAEB samples from the classifier trained on all MDS samples (performance results shown in FIG. 3).

FIG. 8 is a schematic illustration of a laboratory testing system including a mass spectrometer and a programmed computer configured to classify a blood-based sample of an MDS patient to predict the prognosis of the patient.

DETAILED DESCRIPTION

A method for assessing the prognosis of an MDS patient is disclosed, and in particular a method of determining whether an MDS patient is likely to have a relatively worse prognosis or conversely a relatively good prognosis. The method makes use of mass spectrometry data obtained from a blood-based sample obtained from the patient. The method also makes use of a computer configured as a classifier which operates to classify the mass spectrometry data with the aid of a reference set of class-labeled mass spectrometry data obtained from a plurality of blood-based samples from other MDS patients.

The methodology we describe in this document makes use of a MALDI-TOF mass spectrometry method in which the blood-based sample is subjected to at least 100,000 laser shots. This methodology allows greater spectral information to be obtained from the sample than normally acquired using standard “dilute and shoot” MALDI-TOF methods, which typically use only ~1000 to 2000 shots. The methodology preferably makes use of the so-called “deep MALDI” mass spectrometry technique described in U.S. patent application of H. Rőder et al., Ser. No. 13/836,436, filed Mar. 15, 2013, U.S. patent application publication no. 2013/0320203, assigned to the assignee of this invention, the contents of which are incorporated by reference herein. This methodology will be described in some detail in the following detailed description and the discussion of FIG. 8 later in this document.

The method continues with a step of operating on the mass spectral data with a programmed computer implementing a classifier. In a preferred embodiment, the classifier is implemented as a combination of filtered mini-classifiers using a regularized combination method. The classifier is referred to herein as a “CMC/D” classifier (Combination of Mini-Classifiers with Dropout regularization), and makes use of the classifier generation method described in pending U.S. patent application of H. Rőder et al., Ser. No. 14/486,442, filed Sep. 15, 2014, U.S. patent application publication no. 2015/0102216, assigned to the assignee of this invention, the content of which is incorporated by reference herein. This method of generating the classifier from a development set of sample data (mass spectrometry data) will be discussed below in conjunction with FIG. 1. The method of using the classifier to generate a class label to predict the prognosis of the MDS patient is described in detail below in conjunction with FIG. 8.

In the operating step, the classifier compares the integrated intensity values with feature values of a reference set of class-labeled mass spectral data obtained from a multitude of MDS patients and generates a class label for the sample. This step may make use of a classification algorithm such as k-nearest neighbor (KNN) and select a class label by majority vote of the nearest neighbors in a multidimensional feature space. The class label, e.g., Early or the equivalent or Late or the equivalent, is predictive of the patient's prognosis: namely, if the patient's mass spectrum is classified with an Early class label the patient is predicted to have a relatively poor prognosis, whereas if the patient's mass spectrum is classified with a Late class label the patient is predicted to have a good prognosis. The reference set of class-labeled mass spectral data may take the form of a training set used to develop the classifier, or may take the form of some subset of the training set.
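By way of illustration only, the following is a minimal Python sketch of the k-nearest neighbor majority vote just described. The array names (ref_features, ref_labels) and the use of Euclidean distance are assumptions of the example, not requirements of the method.

```python
import numpy as np
from collections import Counter

def knn_classify(sample, ref_features, ref_labels, k=5):
    """Assign a class label ('Early' or 'Late') to a sample's feature
    vector by majority vote of its k nearest neighbors in the
    reference set (Euclidean distance in feature space)."""
    distances = np.linalg.norm(ref_features - sample, axis=1)
    nearest = np.argsort(distances)[:k]
    votes = Counter(ref_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]
```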

It is envisioned that it would be possible to perform a classification test for a specific subtype of MDS, such as an RA classifier described below for the RA subtype of MDS.

We also describe the mass spectrometry features (m/z ranges) which are used for classification. The use of deep MALDI mass spectrometry reveals hundreds of potential features for classification (i.e., features at which integrated intensity values are obtained from the spectrum under test and features for which integrated intensity values are stored from the reference set). In one embodiment, the integrated intensity values are obtained from at least 50 features listed in Appendix A, such as 50 features, 100 features, 300 features, or all 318 of the features.

Our work in discovering the classifier and methodology of predicting the prognosis of an MDS patient occurred as a result of conducting mass spectrometry on a set of blood-based samples from MDS patients. This study, the samples we used, and the method of generating deep MALDI spectra will be described first. Then we will describe certain processing steps performed on the spectra post-acquisition to arrive at a development sample set. We will then describe our methodology of creating the classifier, including performance characteristics of a variety of classifiers we created. Later, this document will describe a representative practical testing environment in which the invention can be practiced, for example in a laboratory setting as a fee-for-service.

I. The Study, Spectral Acquisition, Post-Processing and Classifier Development

Available Samples and Clinical Data

One hundred and forty-nine serum samples were available for classifier development, from patients with myelodysplastic syndrome (MDS), acute myelogenous leukemia (AML), and healthy controls. The patients were enrolled in the study described in Loeffler-Ragg J., et al., Serum CD44 levels predict survival in patients with low-risk myelodysplastic syndromes, Crit. Rev. Oncol. Hematol. vol. 78 no. 2 pp. 150-61 (2011). One sample (sample ID 111) was hemolyzed and so could not be used. Patients were divided into seven groups according to FAB categorization: six patients presenting with AML, 17 patients with chronic myelomonocytic leukemia (CMML), 46 patients with refractory anemia (RA), 29 patients with refractory anemia with excess blasts (RAEB), 17 patients with refractory anemia with excess blasts in transformation (RAEB-t), 14 patients with refractory anemia with ring sideroblasts (RARS), and 19 healthy controls. This categorization follows the now outdated French-American-British (FAB) classification scheme. Under the current WHO categories, patients in the RAEB-t category are now also considered as having AML. For the purposes of this investigation, the healthy patients and those classified by WHO as having AML were not considered.

Some of the clinical characteristics are summarized by patient group in Table 2.

TABLE 2 Baseline clinical and laboratory data for the patients within each clinical group

| AML (n = 6) | CMML (n = 17) | RA (n = 46) | RAEB (n = 29) | RAEB-t (n = 17) | RARS (n = 14) | Controls (n = 19)
Age, median (range) | 76 (58-89) | 73 (44-86) | 66.5 (23-93) | 69 (19-85) | 63 (52-82) | 68.5 (51-84) | 51 (20-93)
Gender, male | 2 | 9 | 25 | 18 | 13 | 7 | 9
Gender, female | 4 | 8 | 21 | 11 | 3 | 7 | 10
Blasts, median (range) | 61 (32-89) | 4 (1-13) | 2 (0-5.5) | 9 (5-19) | 23 (6-20) | 1 (0-4.5) | —
Genetic marker: good karyotype | 3 | 12 | 17 | 12 | 6 | 8 | 0
Genetic marker: del5 | 0 | 0 | 11 | 6 | 0 | 1 | 0
Genetic marker: intermediate | 0 | 2 | 4 | 4 | 3 | 3 | 0
Genetic marker: bad karyotype | 1 | 2 | 3 | 3 | 7 | 0 | 0
IPSS 0 | 0 | 4 | 24 | 0 | 0 | 8 | 0
IPSS 0.5 | 0 | 4 | 9 | 6 | 1 | 4 | 0
IPSS 1.0 | 0 | 5 | 6 | 12 | 0 | 1 | 0
IPSS 1.5 | 0 | 1 | 2 | 1 | 0 | 0 | 0
IPSS 2.0 | 1 | 2 | 0 | 4 | 2 | 0 | 0
IPSS 2.5 | 4 | 0 | 0 | 2 | 3 | 0 | 0
sCD44, median (range) | 872 (459-1757) | 1020 (570-7961) | 586 (340-1210) | 595 (301-1077) | 774 (350-1138) | 547 (360-911) | 487 (325-890)
sEcad, median (range) | 99 (63-128) | 87 (36-230) | 68 (0.2-286) | 104 (32-221) | 73 (40-146) | 83 (57-131) | 83 (57-188)
Secondary AML: No | 0 | 14 | 34 | 11 | 4 | 12 | 0
Secondary AML: Yes | 6 | 2 | 6 | 10 | 9 | 1 | 0
Secondary AML: Unknown | 0 | 1 | 6 | 8 | 4 | 1 | 19

NA = not available

Spectral Acquisition

A. Sample Preparation

Samples were thawed and 3 μl aliquots of each experimental sample and quality control reference serum (a pooled sample obtained from serum from five healthy patients purchased from ProMedDx) spotted onto VeriStrat© cellulose serum cards (Therapak). The cards were allowed to dry for 1 hour at ambient temperature, after which the whole serum spot was punched out with a 6 mm skin biopsy punch (Acuderm). Each punch was placed in a centrifugal filter with 0.45 μm nylon membrane (VWR). One hundred μl of HPLC grade water (JT Baker) was added to the centrifugal filter containing the punch. The punches were vortexed gently for 10 minutes then spun down at 14,000 rcf for 2 minutes. The flow-through was removed and transferred back onto the punch for a second round of extraction. For the second round of extraction, the punches were vortexed gently for 3 minutes then spun down at 14,000 rcf for 2 minutes. Twenty microliters of the filtrate from each sample was then transferred to a 0.5 ml Eppendorf tube for MALDI analysis.

All subsequent sample preparation steps were carried out in a custom-designed humidity and temperature control chamber (Coy Laboratory). The temperature was set to 30° C. and the relative humidity at 10%.

An equal volume of freshly prepared matrix (25 mg of sinapinic acid dissolved in 1 ml of 50% acetonitrile: 50% water plus 0.1% TFA) was added to each 20 μl serum extract and the mix vortexed for 30 sec. The first three aliquots (2×2 μl) of sample:matrix mix were discarded into the tube cap. Three aliquots of 2 μl sample:matrix mix were then spotted onto a polished steel MALDI target plate (Bruker Daltonics). The MALDI target was allowed to dry in the chamber before placement in the MALDI mass spectrometer.

This set of samples (148 experimental samples plus QC sample) was processed for MALDI analysis in three batches. Batches one, two, and three contained 50, 49, and 49 experimental samples plus 6 reference sample preparations, respectively. The preparations of the reference sample were added to the beginning (2 preparations), middle (2 preparations), and end (2 preparations) of each of these three batches.

B. Acquisition of Mass Spectra

MALDI spectra were obtained using a MALDI-TOF mass spectrometer (Ultraflextreme from Bruker Daltonics, Bremen, Germany) equipped with a 2000 Hz SmartBeam laser. Data were acquired with positive ion detection in linear mode with the following settings: accelerating voltage set to 25 kV, extraction voltage set to 23.15 kV, lens voltage set to 7 kV, and the delayed extraction time set to 200 ns. The instrument was externally calibrated using the Bruker Protein Standard Mix consisting of insulin, ubiquitin, cytochrome c, and myoglobin.

Eight hundred shot spectra were collected from 63 pre-defined positions per MALDI spot (63×800×3 spots per sample) for a total of 151,200 laser shots per sample. While in this example spectra from a total of 151,200 laser shots were acquired, such that 189 (63×3) 800-shot spectra were obtained, we believe that suitable deep spectral information would be obtained as long as good quality spectra from at least 100,000 laser shots can be averaged. It would be possible to obtain spectra averaged from an even greater number of shots, such as 500,000 or 1,000,000 shots, using the techniques of the deep-MALDI patent application cited previously. During spectral acquisition, fuzzy control for laser power was turned off. No evaluation criteria were used during acquisition to filter out spectra. All filtering and processing of spectra was done post-acquisition.

Spectral Post-Processing

A. Averaging of Spectra to Produce One Spectrum Per Sample

There were 189 (63×3) replicate 800-shot spectra available for each patient, acquired using deep MALDI instrument settings. The spectra were filtered using a ripple filter to remove artificial noise resulting from the digital converter. The background was subtracted for the purpose of finding peaks to be used in alignment. The threshold for peak detection was set to a signal-to-noise ratio of 3. The raw spectra (no background subtraction) were then aligned using the calibration points listed in Table 3. Only spectra with a minimum of 20 peaks detected and having used 5 alignment points were considered for inclusion in the average. An average for each sample was created by selecting 112 aligned replicate spectra at random, resulting in an average spectrum of about 90K shots.
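A minimal sketch of this replicate selection and averaging step is shown below, assuming the aligned replicates are already interpolated onto a common m/z grid and carry a detected-peak count; the data layout and names are hypothetical.

```python
import numpy as np

def average_replicates(replicates, n_select=112, min_peaks=20, seed=0):
    """Average n_select randomly chosen aligned replicate spectra.

    `replicates` is a list of (intensity_array, n_peaks_detected)
    tuples on a common m/z grid; replicates failing the minimum peak
    count are excluded before the random selection (n_select must not
    exceed the number of usable replicates)."""
    rng = np.random.default_rng(seed)
    usable = [spec for spec, n_peaks in replicates if n_peaks >= min_peaks]
    chosen = rng.choice(len(usable), size=n_select, replace=False)
    return np.mean([usable[i] for i in chosen], axis=0)
```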

TABLE 3 Calibration points used to align the raw spectra prior to averaging

# | m/z
1 | 4153
2 | 4183
3 | 6433
4 | 6631
5 | 8206
6 | 8684
7 | 9133
8 | 11527
9 | 12572
10 | 23864
11 | 13763
12 | 13882
13 | 14040
14 | 15127
15 | 15869
16 | 17253
17 | 18630
18 | 21066
19 | 28108
20 | 28316

B. Preprocessing of Averaged Spectra

The spectra were background subtracted (two windows, 80,000/10,000) and normalized using the partial ion current (PIC) windows listed in the table below (Table 4). Background subtraction of mass spectrometry data is known in the art and described in the prior patent of Biodesix, Inc., U.S. Pat. No. 7,736,905, the content of which is incorporated by reference herein. Partial ion current normalization is also explained in the '905 patent.

TABLE 4 Normalization windows used in pre-processing the spectra, left and right m/z boundaries

Left m/z | Right m/z
3748 | 3787
5918 | 6038
6164 | 6244
7339 | 7518
10542 | 10592
12246 | 12352
18385 | 18521
22077 | 22883
22889 | 23893

These windows were selected with a method that protects against using windows that are significantly different between groups of interest (e.g., healthy control vs MDS), which could lead to a reduction in classification potential, and also against features that are intrinsically unstable. The entire m/z region was divided into 92 bins that varied in size to prevent the bin boundaries from landing within peaks. For each m/z bin, feature values were determined for each sample. The feature values were compared using a Wilcoxon rank-sum test for the group comparisons listed in Table 5. If the resulting p value was between 0 and 0.1, the region was excluded from normalization. If the CV of the feature values (all samples) was greater than 1.0, the region was excluded. The 9 windows above met the requirement for all 4 group comparisons.

TABLE 5 Group comparisons used to test normalization window dependency on clinical group

# | Group Comparison
1 | Age 60 or less versus more than 60
2 | Healthy Control versus MDS
3 | No Secondary AML versus Secondary AML
4 | Cytogenetics good versus bad

Using these 9 bins as PIC normalization windows, a normalization scalar was calculated for each sample. A final comparison of groups was performed using the normalization scalars to ensure that the groups and the normalization parameters used were not significantly correlated. The box and whisker plots of FIG. 2 demonstrate that the groups have similar distributions of normalization scalars.
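For illustration, applying the partial ion current normalization with the Table 4 windows might be sketched as follows; the representation of a spectrum as parallel m/z and intensity arrays is an assumption of the example.

```python
import numpy as np

# The nine normalization windows of Table 4 as (left, right) m/z pairs.
PIC_WINDOWS = [(3748, 3787), (5918, 6038), (6164, 6244), (7339, 7518),
               (10542, 10592), (12246, 12352), (18385, 18521),
               (22077, 22883), (22889, 23893)]

def pic_normalize(mz, intensity, windows=PIC_WINDOWS):
    """Divide a spectrum by its normalization scalar, i.e., the summed
    intensity falling inside the PIC windows."""
    mask = np.zeros_like(mz, dtype=bool)
    for left, right in windows:
        mask |= (mz >= left) & (mz <= right)
    scalar = intensity[mask].sum()
    return intensity / scalar, scalar
```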

The spectra were then calibrated using the calibration points listed in Table 6 to remove slight differences in alignment.

TABLE 6 Calibration points used to align the deep MALDI average spectra

# | m/z
1 | 4153
2 | 4185
3 | 6433
4 | 6631
5 | 6940
6 | 7563
7 | 7934
8 | 8915
9 | 9420
10 | 12862
11 | 13761
12 | 13879
13 | 14040
14 | 15128
15 | 15869
16 | 17255
17 | 17383
18 | 28108
19 | 28316

C. Feature Definitions

Feature definitions (m/z ranges) for use in classification were selected by viewing a subset of spectra from patients with “early” death (<12 months) compared to a “late” group with long survival (>36 months). Only patients with CMML, RA, RAEB, or RARS were included. Left and right peak boundaries were assigned by assessing the compilation of spectra for each feature. This process ensures the features are adequately captured for any individual spectrum. Feature definitions were allowed to overlap for neighboring features. A total of 318 features were identified and can be found in Appendix A. The feature definitions were applied to each spectrum in the development sample set to create a feature table of feature values (integrated intensity values for each feature).

D. Analysis of Reference Samples by Batch

Six preparations of the reference sample (quality control sample) were prepared along with the experimental samples in each batch. Two of these preparations were plated at the beginning (rep 1 and 2), two at the end (rep 5 and 6), and 2 preparations were plated amid the experimental samples (rep 3 and 4). The purpose of the reference samples was to provide a common sample in each batch that could be used to correct the batches for expected day-to-day fluctuations in spectral acquisition. The reference samples were preprocessed as described above.

A set of feature definitions, specific to the reference sample and selected for their stability, was applied to the spectra. These feature definitions can be found in Appendix B of our prior provisional application, incorporated by reference. The resulting feature table was used only in the analysis of the reference samples. The reference sample spectra were analyzed to find the two replicates that were most similar from the beginning and end of each batch. We compared each possible combination of replicates (1 and 5, 1 and 6, 2 and 5, 2 and 6) using the function:

$A = \min\left( \left| 1 - \frac{\mathrm{ftrval}_1}{\mathrm{ftrval}_2} \right|, \; \left| 1 - \frac{\mathrm{ftrval}_2}{\mathrm{ftrval}_1} \right| \right)$

where ftrval1 (ftrval2) is the value of a feature for the first (second) replicate of the replicate pair. This quantity A gives a measure of how similar the replicates of the pair are. The average of A was calculated across all possible combinations of beginning and end reference sample (“SerumP2”) replicate pairs for all features. The resulting list was sorted by increasing values of A. The 20 features with the lowest values were used to determine the most similar combinations of reference sample replicates taken from the beginning and ends of the batches. This process prevents the use of an outlier replicate spectrum in the batch correction procedure. Table 7 lists the features that were used to determine the most similar replicate combinations.

TABLE 7 The 20 most stable features considering beginning and end of batch reference spectra replicates

m/z: 3365, 4014, 4289, 4712, 4783, 5106, 5705, 5821, 5907, 6880, 6920, 6946, 7472, 7566, 7935, 8432, 10347, 10838, 11372, 12960

Using a cutoff of 0.2 for A, the combination with the most passing features was deemed the most similar and used for batch correction purposes. In the case of a tie, the combination sitting in the leftmost position of the prescribed order 1_5, 1_6, 2_5, 2_6 is used. If no combination was found where at a minimum 15 of the 20 features passed the cutoff for a batch, then the batch would be considered a failure and would need to be re-run. In this project, all 3 batches passed using these criteria. For each batch, the combination of most similar reference spectra replicates was found. An average was created from the two replicates by averaging the feature values of the two replicates for each feature. These average feature values were used as the reference for each batch for the purpose of batch correction.
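A sketch of the replicate pair selection using the similarity measure A is given below; the replicate naming and the dictionary layout are hypothetical, but the cutoff (0.2), the minimum of 15 passing features, and the tie-breaking order follow the description above.

```python
import numpy as np

def similarity_A(ftr1, ftr2):
    """Per-feature similarity A between two replicate feature vectors."""
    return np.minimum(np.abs(1 - ftr1 / ftr2), np.abs(1 - ftr2 / ftr1))

def pick_reference_pair(reps, stable_idx, cutoff=0.2, min_passing=15):
    """Choose the beginning/end replicate pair (1_5, 1_6, 2_5, 2_6)
    with the most of the 20 stable features satisfying A < cutoff.
    `reps` maps a replicate name ('1', '2', '5', '6') to its feature
    vector; ties resolve to the leftmost pair in the prescribed order."""
    pairs = [("1", "5"), ("1", "6"), ("2", "5"), ("2", "6")]
    best, best_count = None, -1
    for a, b in pairs:
        n_pass = int((similarity_A(reps[a][stable_idx],
                                   reps[b][stable_idx]) < cutoff).sum())
        if n_pass > best_count:  # strict > keeps the leftmost pair on ties
            best, best_count = (a, b), n_pass
    if best_count < min_passing:
        raise ValueError("batch fails QC and would need to be re-run")
    return best
```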

E. Batch Correction

Batch 1 was used as the baseline batch to correct all other batches. The reference sample was used to find the correction coefficients for each of batches 2 and 3 by the following procedure.

Within each batch j (2≤j≤3), the ratio

$\hat{r}_i^j = \frac{A_i^j}{A_i^1}$

and the average amplitude $\bar{A}_i^j = \frac{1}{2}\left(A_i^j + A_i^1\right)$ are defined for each $i$-th feature centered at $(m/z)_i$, where $A_i^j$ is the average reference spectrum amplitude of feature $i$ in the batch being corrected and $A_i^1$ is the reference spectrum amplitude of feature $i$ in batch 1 (the reference standard). It is assumed that the ratio of amplitudes between two batches follows the dependence

$r\left(\bar{A}, (m/z)\right) = \left(a_0 + a_1 \ln \bar{A}\right) + \left(b_0 + b_1 \ln \bar{A}\right)(m/z) + c_0 (m/z)^2.$

On a batch-to-batch basis, a continuous fit is constructed by minimizing the sum of squared residuals, $\Delta^j = \sum_i \left( \hat{r}_i^j - r^j(a_0, a_1, b_0, b_1, c_0) \right)^2$, using the experimental data of the reference sample. The features used to create this fit are only a subset (described in Appendix C, table C.1, of our prior provisional application, incorporated by reference) of the whole available set, from which features known to have poor reproducibility were removed. Steps were taken not to include outlier points in order to avoid bias in the parameter estimates. The values of the coefficients $a_0$, $a_1$, $b_0$, $b_1$, and $c_0$ obtained for the different batches are listed in Appendix C of the prior provisional application (table C.2). The projection onto the $\hat{r}_i^j$ versus $(m/z)_i$ plane of the points used to construct the fit for each batch of reference spectra, together with the surface defined by the fit itself, is shown in figure C.1 of Appendix C of our prior provisional application.

Once the final fit, $r^j(\bar{A}, (m/z))$, is determined for each batch, the next step is to correct, for all the samples, all the features (with amplitude $A$ at $(m/z)$) according to

$A_{corr} = \frac{A}{r^j\left(A, (m/z)\right)}.$

After this correction, the corrected $\left(A_i^j, (m/z)_i, \hat{r}_i^j\right)$ feature values calculated for reference spectra lie around the horizontal line defined by $r = 1$, as shown in figure C.2 of Appendix C of our prior provisional application.
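As a sketch, the fit and correction can be expressed as an ordinary least-squares problem, since the model is linear in the coefficients a0, a1, b0, b1, c0. The function names are hypothetical, and whether the correction is evaluated at the raw or averaged amplitude is a detail not fixed by this sketch.

```python
import numpy as np

def fit_batch_correction(r_hat, A_bar, mz):
    """Least-squares fit of r = (a0 + a1*ln(A)) + (b0 + b1*ln(A))*(m/z)
    + c0*(m/z)^2 to the observed reference-spectrum amplitude ratios."""
    X = np.column_stack([np.ones_like(A_bar), np.log(A_bar),
                         mz, np.log(A_bar) * mz, mz ** 2])
    coeffs, *_ = np.linalg.lstsq(X, r_hat, rcond=None)
    return coeffs  # a0, a1, b0, b1, c0

def apply_batch_correction(A, mz, coeffs):
    """Correct a feature amplitude A at position m/z: A_corr = A / r."""
    a0, a1, b0, b1, c0 = coeffs
    r = (a0 + a1 * np.log(A)) + (b0 + b1 * np.log(A)) * mz + c0 * mz ** 2
    return A / r
```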

The mass spectrometry data set (feature table) for each of the blood-based samples resulting from the above pre-processing steps is referred to as the development sample set 100 in FIG. 1 and was used to construct a number of different classifiers. The process of using this development sample set to generate classifiers will be described in the following section.

CMC/D Classifier Generation Method

The new classifier development process using the method of combination of mini-classifiers (mCs) with dropout (CMC/D) is shown schematically in FIG. 1. The steps in this process are explained in detail below. The methodology, its various advantages, and several examples of its use are explained in great detail in U.S. patent application Ser. No. 14/486,442, filed Sep. 15, 2014, the content of which is incorporated by reference. A brief explanation of the methodology will be provided here first, and then illustrated in detail in conjunction with FIG. 1 for the generation of the MDS prognosis classifiers.

In contrast to standard applications of machine learning, which focus on developing classifiers when large training data sets are available (the big data challenge), the problem setting in bio-life-sciences is different. Here we have the problem that the number (n) of available samples, arising typically from clinical studies, is often limited, and the number of attributes (p) per sample usually exceeds the number of samples. Rather than obtaining information from many instances, in these deep data problems one attempts to gain information from a deep description of individual instances. The present methods take advantage of this insight, and are particularly useful, as here, in problems where p >> n.

The method includes a first step a) of obtaining measurement data for classification from a multitude of samples, i.e., measurement data reflecting some physical property or characteristic of the samples. The data for each of the samples consists of a multitude of feature values, and a class label. In this example, the data takes the form of mass spectrometry data, in the form of feature values (integrated peak intensity values at a multitude of m/z ranges or peaks) as well as a label indicating some attribute of the sample (patient Early or Late death). In this example, the class labels were assigned by a human operator to each of the samples after investigation of the clinical data associated with the sample. The development sample set is then split into a training set and a test set, and the training set is used in the following steps b), c) and d).

The method continues with a step b) of constructing a multitude of individual mini-classifiers using sets of features from the samples up to a pre-selected feature set size s (s = integer 1 . . . n). For example, a multitude of individual mini- or atomic classifiers could be constructed using a single feature (s = 1), or pairs of features (s = 2), or three of the features (s = 3), or even higher order combinations containing more than 3 features. The selection of a value of s will normally be small enough to allow the code implementing the method to run in a reasonable amount of time, but could be larger in some circumstances or where longer code run-times are acceptable. The selection of a value of s also may be dictated by the number of measurement data values (p) in the data set, and where p is in the hundreds, thousands or even tens of thousands, s will typically be 1, or 2 or possibly 3, depending on the computing resources available. The mini-classifiers execute a supervised learning classification algorithm, such as k-nearest neighbors, in which the values for a feature or pairs of features of a sample instance are compared to the values of the same feature or features in a training set, the nearest neighbors (e.g., k = 5) in an s-dimensional feature space are identified, and by majority vote a class label is assigned to the sample instance for each mini-classifier. In practice, there may be thousands of such mini-classifiers depending on the number of features which are used for classification.

The method continues with a filtering step c), namely testing the performance, for example the accuracy, of each of the individual mini-classifiers to correctly classify at least some of the multitude of samples, or measuring the individual mini-classifier performance by some other metric (e.g., the difference between the Hazard Ratios (HRs) obtained between groups defined by the classifications of the individual mini-classifier for the training set samples), and retaining only those mini-classifiers whose classification accuracy, predictive power, or other performance metric exceeds a pre-defined threshold to arrive at a filtered (pruned) set of mini-classifiers. The class label resulting from the classification operation may be compared with the class label for the sample known in advance if the chosen performance metric for mini-classifier filtering is classification accuracy. However, other performance metrics may be used and evaluated using the class labels resulting from the classification operation. Only those mini-classifiers that perform reasonably well under the chosen performance metric for classification are maintained. Alternative supervised classification algorithms could be used, such as linear discriminants, decision trees, probabilistic classification methods, margin-based classifiers like support vector machines, and any other classification method that trains a classifier from a set of labeled training data.

To overcome the problem of being biased by some univariate feature selection method depending on subset bias, we take a large proportion of all possible features as candidates for mini-classifiers. We then construct all possible KNN classifiers using feature sets up to a pre-selected size or depth (parameters). This gives us many “mini-classifiers”: e.g., if we start with 100 features for each sample (p = 100), we would get 4950 “mini-classifiers” from all different possible combinations of pairs of these features (s = 2), 161,700 mini-classifiers using all possible combinations of three features (s = 3), and so forth. Other methods of exploring the space of possible mini-classifiers and features defining them are of course possible and could be used in place of this hierarchical approach. Of course, many of these “mini-classifiers” will have poor performance, and hence in the filtering step c) we only use those “mini-classifiers” that pass predefined criteria. These criteria are chosen dependent on the particular problem: if one has a two-class classification problem, one would select only those mini-classifiers whose classification accuracy exceeds a pre-defined threshold, i.e., are predictive to some reasonable degree. Even with this filtering of “mini-classifiers” we end up with many thousands of “mini-classifier” candidates with performance spanning the whole range from borderline to decent to excellent performance.
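A sketch of this enumeration, using scikit-learn's k-nearest neighbor implementation as a stand-in for the kNN mini-classifiers, is shown below; the helper name and data layout are assumptions of the example.

```python
from itertools import combinations
from sklearn.neighbors import KNeighborsClassifier

def make_mini_classifiers(X_train, y_train, n_features, max_depth=2, k=5):
    """Build one kNN mini-classifier per feature subset of size
    1..max_depth. With n_features=100 and max_depth=2 this yields
    100 + 4950 = 5050 mini-classifiers."""
    mcs = []
    for s in range(1, max_depth + 1):
        for subset in combinations(range(n_features), s):
            clf = KNeighborsClassifier(n_neighbors=k)
            clf.fit(X_train[:, subset], y_train)
            mcs.append((subset, clf))
    return mcs
```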

The method continues with step d) of generating a master classifier by combining the filtered mini-classifiers using a regularized combination method. In one embodiment, this regularized combination method takes the form of repeatedly conducting a logistic training of the filtered set of mini-classifiers to the class labels for the samples. This is done by randomly selecting a small fraction of the filtered mini-classifiers as a result of carrying out an extreme dropout from the filtered set of mini-classifiers (a technique referred to as drop-out regularization herein), and conducting logistic training on such selected mini-classifiers. While similar in spirit to standard classifier combination methods (see, e.g., S. Tulyakov et al., Review of Classifier Combination Methods, Studies in Computational Intelligence, Volume 90, 2008, pp. 361-386), we have the particular problem that some “mini-classifiers” could be artificially perfect just by random chance, and hence would dominate the combinations. To avoid this overfitting to particular dominating “mini-classifiers”, we generate many logistic training steps by randomly selecting only a small fraction of the “mini-classifiers” for each of these logistic training steps. This is a regularization of the problem in the spirit of dropout as used in deep learning theory. In this case, where we have many mini-classifiers and a small training set, we use extreme dropout, where in excess of 99% of filtered mini-classifiers are dropped out in each iteration.

In more detail, the result of each mini-classifier is one of two values, either “Early” or “Late” in this example. We can then use logistic regression to combine the results of the mini-classifiers by defining the probability of obtaining an “Early” label via standard logistic regression (see, e.g., the Wikipedia page on logistic regression):

$P\left(\text{“Early”} \mid \text{feature values for a spectrum}\right) = \frac{\exp\left( \sum_{\text{mini-classifiers}} w_{mc} \, I\left( mc(\text{feature values}) \right) \right)}{\text{Normalization}} \qquad \text{Eq. (1)}$

where $I(mc(\text{feature values})) = 1$ if the mini-classifier $mc$ applied to the feature values of a sample returns “Early”, and 0 if the mini-classifier returns “Late”. The weights for each of the mini-classifiers, $w_{mc}$, are unknown and need to be determined from a regression fit of the above formula for all samples in the training set, using +1 for the left hand side of the formula for the Early-labeled samples in the training set, and 0 for the Late-labeled samples, respectively. As we have many more mini-classifiers, and therefore weights, than samples (typically thousands of mini-classifiers and only tens of samples), such a fit will always lead to nearly perfect classification, and can easily be dominated by a mini-classifier that, possibly by random chance, fits the particular problem very well. We do not want our final test to be dominated by a single special mini-classifier which only performs well on this particular set and is unable to generalize well. Hence we designed a method to regularize such behavior: instead of one overall regression to fit all the weights for all mini-classifiers to the training data at the same time, we use only a few of the mini-classifiers for a regression, but repeat this process many times in generating the master classifier. For example, we randomly pick three of the mini-classifiers, perform a regression for their three weights, pick another set of three mini-classifiers, and determine their weights, and repeat this process many times, generating many random picks, i.e., realizations of three mini-classifiers. The final weights defining the CMC/D master classifier are then the averages of the weights over all such realizations. The number of realizations should be large enough that each mini-classifier is very likely to be picked at least once during the entire process. This approach is similar in spirit to “drop-out” regularization, a method used in the deep learning community to add noise to neural network training to avoid being trapped in local minima of the objective function.
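The following sketch illustrates the extreme-dropout averaging of logistic regression weights described above, using scikit-learn's logistic regression with a very weak penalty as a stand-in for an unregularized fit. The pick of three mini-classifiers per realization is taken from the example in the text; everything else (names, iteration count) is an assumption of the sketch.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def dropout_regularized_weights(mc_outputs, labels, n_pick=3,
                                n_realizations=10000, seed=0):
    """Average logistic-regression weights over many random picks of a
    few mini-classifiers (extreme dropout).

    mc_outputs: (n_samples, n_mcs) matrix of 0/1 mini-classifier calls
    (1 = 'Early'); labels: 0/1 class labels (1 = 'Early')."""
    rng = np.random.default_rng(seed)
    n_mcs = mc_outputs.shape[1]
    weight_sum = np.zeros(n_mcs)
    pick_count = np.zeros(n_mcs)
    for _ in range(n_realizations):
        picked = rng.choice(n_mcs, size=n_pick, replace=False)
        lr = LogisticRegression(C=1e6)  # essentially unpenalized fit
        lr.fit(mc_outputs[:, picked], labels)
        weight_sum[picked] += lr.coef_[0]
        pick_count[picked] += 1
    return weight_sum / np.maximum(pick_count, 1)
```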

Other methods for performing the regularized combination in step d) that could be used include:

- Logistic regression with a penalty function like ridge regression (based on Tikhonov regularization; Tikhonov, Andrey Nikolayevich (1943). On the stability of inverse problems. Doklady Akademii Nauk SSSR 39 (5): 195-198).
- The Lasso method (Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Royal. Statist. Soc. B., Vol. 58, No. 1, pages 267-288).
- Neural networks regularized by drop-out (Nitish Srivastava, “Improving Neural Networks with Dropout”, Master's Thesis, Graduate Department of Computer Science, University of Toronto; available online from the computer science department website of the University of Toronto, link set forth in our prior provisional application).
- General regularized neural networks (Girosi F. et al., Neural Computation, (7), 219 (1995)).

The above-cited publications are incorporated by reference herein. Our approach of using drop-out regularization has shown promise in avoiding over-fitting and increasing the likelihood of generating generalizable tests, i.e., tests that can be validated in independent sample sets.

In step e) of the method, the set of samples is randomly separated into a test set and a training set, and steps b)-d) are repeated in the programmed computer for different realizations of the separation of the set of samples into test and training sets, thereby generating a plurality of master classifiers, one for each realization of the separation of the set of samples into training and test sets.

The method continues with step f) of defining a final classifier from one or a combination of more than one of the plurality of master classifiers. In the present example, the final classifier is defined as a majority vote of all the master classifiers resulting from each separation of the sample set into training and test sets, or, if the sample is in the development set, the majority vote of all the master classifiers resulting from each separation of the sample set into training and test sets when the sample is not in the training set.

With reference now to FIG. 1, we have a development sample set 100, in this case the mass spectrometry data of the patients provided in the study.

Step 102 Definition of Initial Class Labels

For the purposes of developing a prognostic classifier able to identify patients with better or worse survival after presentation with MDS, the initial class labels were assigned based on short or long survival (class label “Early” = early death, class label “Late” = long survival), with the “Early” group 104 initially composed of patients with death or censoring at or before 24 months and the “Late” group 106 composed of patients with death or censoring after 24 months. Only the 101 patients with survival data available were used in this classifier development approach.

Steps 120, 122 Creation and Filtering of Mini-Classifiers

Once the initial definition of the class labels has been established, the development set (100) is split into training and test sets at step 108. In step 120, many k-nearest neighbor (kNN) mini-classifiers (mCs) that use the training set as their reference set are constructed using subsets of features from the 318 mass spectral features identified (see Appendix A). For many of the investigations, all possible single features and pairs of features were examined (s = 2); however, when fewer features were used, triplets were also considered (s = 3). For the 318 mass spectral features, just traversing all single features and pairs of features amounts to considering 50,721 possible mCs. The parameters used to traverse the space of mCs for this project are listed in Table 8.

TABLE 8 Parameters used to create mCs

kNN parameters | k = 5 (except when otherwise indicated)

Furthermore, we used all pairs of features and all single features for “2-deep” mini-classifiers (s = 2), and all single features, all pairs of features, and all triples of features for “3-deep” mini-classifiers (s = 3).

At step 126, to target a final classifier that has certain performance characteristics, these mCs are filtered. Each mC is applied to its training set (112) and performance metrics are calculated from the resulting classifications of the training set. Only those mCs that satisfy thresholds on these performance metrics pass filtering to be used further in the process. The mCs that fail filtering are discarded. For this project, hazard ratio filtering was used, i.e., the classifier was applied to a set of samples (such as the training set or a subset of the patients without liver disease) and the hazard ratio for survival between the groups defined by the resulting classification had to lie within a preset range for the mC to pass filtering. The filtering options used in this project are listed in Table 9. We also tried accuracy filtering and found it produced inferior results.
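For illustration, the hazard ratio filter can be sketched with the lifelines survival analysis package (an assumption of this example; the patent does not prescribe an implementation), fitting a one-covariate Cox model to the training samples grouped by the mini-classifier's calls:

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

def passes_hr_filter(calls, surv_time, event, hr_min=3.0, hr_max=10.0):
    """Keep a mini-classifier only if the hazard ratio between its
    'Early' and 'Late' groups on the training set lies in the preset
    range. `calls` holds the mC's 'Early'/'Late' labels."""
    df = pd.DataFrame({"early": (np.asarray(calls) == "Early").astype(int),
                       "time": surv_time, "event": event})
    cph = CoxPHFitter()
    cph.fit(df, duration_col="time", event_col="event")
    hr = float(np.exp(cph.params_["early"]))
    return hr_min < hr < hr_max
```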

Steps 130 and 132 Master Classifier as a Combination of Mini-Classifiers Using Logistic Regression with Dropout

Once the filtering of the mCs is complete, the mCs are combined into one master classifier (MC) using a logistic regression trained using the training set labels. To help avoid overfitting, the regression is regularized using extreme dropout. Most of the CMC/D approaches in this study randomly selected 10 of the mCs for inclusion in each logistic regression iteration. The number of dropout iterations was selected based on the typical number of mCs passing filtering for each approach, to ensure that each mC was likely to be included within the dropout process multiple times.

Step 134 Analysis of Master Classifier Performance and Training and Test Splits (Loop 135)

At step 134, the performance of the MC generated at step 130 is tested by subjecting the test set split (110) to classification by the MC and evaluating the resulting classifications.

The split of the class groups into training and test sets at step 108 is performed many times using a stratified randomization, as indicated by the loop 135. Each training/test split produces an MC which can be applied to the split test set (110) at step 134 to assess performance. The use of multiple training/test splits avoids selection of a single, particularly advantageous or difficult, training set for classifier creation and avoids bias in performance assessment from testing on a test set that could be especially easy or difficult to classify.

One other advantage of these multiple training/test splits is that it allows for the refinement of the initial assignment of the class groups (“Early”/“Late”) when these are not known definitively. For example, when one tries to split a patient cohort into two groups with better (“Late”) and worse (“Early”) time-to-event outcome, it is generally not clear, a priori, which patients will be in which class, as each class will typically display a range of outcomes and these usually overlap. (That is, it is very likely that there are patients in the “Early” group who have longer time-to-event values than some patients in the “Late” group.) When the class definitions are uncertain, the CMC/D approach as shown in FIG. 1 allows one to make an initial guess for the class label assignments (at step 102) and then refine this through a series of iterations of the process (see FIG. 1, loop 142).

At step 136, the classifier performance data is analyzed by obtaining performance characteristics for the MCs of each training/test set split and the associated classification results.

As indicated at 140, a check is made to see if any samples persistently misclassify; if so, those samples that persistently misclassify have their class label flipped (Early -> Late; Late -> Early) and the process repeats as indicated by loop 142.

In particular, for the training/test splits where a particular sample from the defined groups (102), i.e., the union of samples in Class 1 (Early) (104) and Class 2 (Late) (106), is in the test set (110), the resulting classifications for the sample can be obtained by applying the respective MCs. If the sample persistently misclassifies relative to the initial guess for the patient prognosis class, the sample can be moved from the better outcome class into the worse outcome class, or vice versa. Carrying out this procedure of checking the classifications and flipping the class when there are persistent misclassifications for all samples with defined class labels (102) produces a new, refined version of the group definitions, which is the starting point for a second iteration of the CMC/D process, as indicated by loop 142 looping back to step 102. This refinement process can be iterated so that the better/worse prognosis classes are determined at the same time as a classifier is constructed. Each approach to this project based on survival outcome involved several rounds of these reference class label swaps.
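A minimal sketch of one round of this label-flip refinement is shown below; the out-of-bag bookkeeping and the 50% disagreement threshold are assumptions for illustration.

```python
def refine_labels(labels, oob_calls, threshold=0.5):
    """Flip the class label of any sample whose out-of-bag (test set)
    classifications persistently disagree with its current label.

    labels: dict sample id -> 'Early'/'Late'; oob_calls: dict sample
    id -> list of labels from MCs whose training split excluded it."""
    refined = dict(labels)
    for sid, calls in oob_calls.items():
        if not calls:
            continue
        mis_rate = sum(c != labels[sid] for c in calls) / len(calls)
        if mis_rate > threshold:
            refined[sid] = "Late" if labels[sid] == "Early" else "Early"
    return refined
```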

The output of the logistic regression that defines each MC generated at step 130 is a probability of being in one of the two training classes. These MC outputs can be combined to make one resultant classifier in several ways, as outlined in the list below.

- Applying a cutoff (e.g., 0.5) to these probabilities, one can generate a binary classification label for a sample from each MC. These labels can then be combined in a majority vote to obtain one binary classification for a sample. When analyzing the performance of the classifier in the development set (at steps 134, 138), it is helpful to use a modified majority vote for samples which are used in training the classifier. For samples which are used in the training set of some of the training/test set split realizations, the modified majority vote (MMV) is defined as the majority vote of the MC labels over the MCs which do not have the sample in the training set. For samples which are never used in any training set, the modified majority vote and majority vote are identical. (A sketch of this procedure appears after this list.)
- The MC probabilities can be averaged to yield one average probability for a sample. When working with the development set, this approach can also be adjusted to average over MCs for which a given sample is not included in the training set, in an analogous way to the MMV procedure. These average probabilities can be used as the output of a classifier, or a threshold can be applied to convert them into a binary classification.
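A sketch of the modified majority vote (first bullet) follows; the representation of each master classifier as a (training ids, prediction function) pair is an assumption of the example.

```python
def modified_majority_vote(sample_id, master_classifiers, cutoff=0.5):
    """Modified majority vote: for a development-set sample, vote only
    over master classifiers whose training split excluded the sample.

    master_classifiers: list of (train_ids, predict) pairs, where
    predict(sample_id) returns 1 for 'Early' and 0 for 'Late'."""
    votes = [predict(sample_id)
             for train_ids, predict in master_classifiers
             if sample_id not in train_ids]
    if not votes:  # sample never held out: plain majority vote
        votes = [predict(sample_id) for _, predict in master_classifiers]
    return "Early" if sum(votes) / len(votes) > cutoff else "Late"
```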

The present CMC/D method works best when the two classes in the training set are of approximately equal sizes. To achieve this it may be necessary to sample the classes at different rates. In addition, performance has been seen to deteriorate quickly when the size of the training sets drops very low. When there are small numbers in one of the training classes, it can be advantageous to include most of the samples in the kNN reference set in each realization, leaving only a few samples as a test set. This process still works well provided the number of training/test set split realizations is scaled up to allow for adequate statistics for all samples when they are in the test sets of the realizations.
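For instance, the generation of such split realizations, sampling the two classes at different rates so that the training halves are of equal size, might be sketched as follows (hypothetical names and parameter values; when one class is small, n_train_per_class can be set close to the size of the smaller class and n_realizations scaled up, as discussed above):

import numpy as np

def make_splits(labels, n_train_per_class, n_realizations=625, seed=0):
    # yields (train, test) index arrays with equal numbers of 'Early'
    # and 'Late' samples in each training set
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    early = np.flatnonzero(labels == 'Early')
    late = np.flatnonzero(labels == 'Late')
    for _ in range(n_realizations):
        train = np.concatenate([
            rng.choice(early, n_train_per_class, replace=False),
            rng.choice(late, n_train_per_class, replace=False)])
        test = np.setdiff1d(np.arange(len(labels)), train)
        yield train, test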

Many implementations of the CMC/D process were investigated, varying in the population used for the test/training splits, the filtering used in the CMC/D process, and the feature space explored.

Some of these approaches involved feature selection within the sets of mass spectral features, which was done by choosing the features with the lowest p-values for a t-test between the two class definitions for each round of CMC/D.
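This feature selection step could be sketched as follows using a standard two-sample t-test (scipy.stats.ttest_ind); the function and variable names are hypothetical illustrations, not the code used in this work.

import numpy as np
from scipy.stats import ttest_ind

def select_features(feature_table, labels, n_features=100):
    # feature_table: (n_samples, n_total_features) array of integrated
    # intensities; labels: array of 'Early'/'Late' class definitions
    labels = np.asarray(labels)
    _, p_values = ttest_ind(feature_table[labels == 'Early'],
                            feature_table[labels == 'Late'], axis=0)
    return np.argsort(p_values)[:n_features]  # indices of lowest p-values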

A summary of some of the approaches tried during new classifier development using the standard CMC/D workflow and the first set of defined features is presented in Table 9. Table 9 summarizes the approaches that made classifiers based on better and worse survival.

TABLE 9 Approaches to CMC/D used for identification of classifier based on better or worse survival

Patient subset | Features used | Depth (# features in kNN mCs) | Filtering options (default k = 5)
All patients with survival data (n = 101) | 318 | 2 | 0.75 < training set accuracy < 0.95
All patients with survival data (n = 101) | 318 | 2 | 3 < HR < 10
All patients with survival data (n = 101) | 100 selected by t-test | 2 | 3 < HR < 10
All patients with survival data (n = 101) | 100 selected by t-test, with sCD44 and sEcad as features | 2 | 3 < HR < 10
All patients with survival data (n = 101) | 100 selected by t-test, run with k = 9 | 2 | 3 < HR < 10
RA patients with survival data (n = 45) | 100 selected by t-test | 2 | 3 < HR < 10
RAEB patients with survival data (n = 26) | 100 selected by t-test | 2 | 3 < HR < 10

Note that we explored generation of classifiers with either 100 or all 318 of the features listed in Appendix A. It would be possible to use other numbers of features, e.g., selected by t-statistic or selected by some other statistical method for classification power, such as 50, 150 or some other number of features. Preferably at least 50 features are used in order to take advantage of the deep spectral information obtained by subjecting the samples to at least 100,000 laser shots.

Results

The performance of the classifiers based on better or worse survival was assessed using the Kaplan-Meier survival curves for the resulting classifications as defined by modified majority vote (out-of-bag estimate). For this problem, the best classifier performance was obtained using the subset of 100 features of Appendix A selected based on lowest p value from a t-test between the initial class definitions of Early (poor prognosis) and Late (good prognosis). FIG. 3 shows the Kaplan-Meier plots for the survival of Early and Late groups along the iterations of the CMC/D process as the class definitions are refined during successive iterations of loop 142. All patients with survival data and MDS (RA, RAEB, RARS or CMML) are included. Panel FIG. 3F shows the result at convergence of the class labels and is the result expected to generalize best to unseen sample sets. For this approach K = 5 in the K-nearest neighbor algorithm and hazard ratio filtering of mCs were used. Note that in the figures the plots show a clear separation of survival curves between the patients in the Early and Late groups.

For the final classifier (FIG. 3F) the hazard ratio between Early and Late groups is 0.34 with 95% confidence interval (CI) of 0.18-0.49, log-rank p value <0.001. The median survival is 13 months (95% CI: 9-24 months) in the Early group compared with 53 months (95% CI: 37-75 months) in the Late group. The classifications assigned to each sample are listed in Appendix D of our prior provisional application. We explored generating classifiers that used larger values of K in the kNN algorithm used by the mC in FIG. 1, and classifiers using accuracy filtering instead of hazard ratio filtering in step 126 of FIG. 1, but both approaches produced inferior results.
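For readers wishing to reproduce this style of assessment on their own data, the Kaplan-Meier, hazard ratio and log-rank statistics can be sketched with the open-source lifelines package; the data frame column names ('months', 'event', 'label') are hypothetical, and this is merely one way to compute such statistics, not the code used here.

import pandas as pd
from lifelines import KaplanMeierFitter, CoxPHFitter
from lifelines.statistics import logrank_test

def assess_classifier(df):
    # df columns (hypothetical): 'months', 'event' (1 = death), 'label'
    km = KaplanMeierFitter()
    for group in ('Early', 'Late'):
        grp = df[df['label'] == group]
        km.fit(grp['months'], grp['event'], label=group)
        print(group, 'median survival:', km.median_survival_time_)
    # hazard ratio (Late vs. Early) with 95% CI from a one-factor Cox model
    cox = CoxPHFitter()
    cox.fit(df.assign(late=(df['label'] == 'Late').astype(int))
              [['months', 'event', 'late']],
            duration_col='months', event_col='event')
    print(cox.summary[['exp(coef)',
                       'exp(coef) lower 95%', 'exp(coef) upper 95%']])
    early, late = df[df['label'] == 'Early'], df[df['label'] == 'Late']
    result = logrank_test(early['months'], late['months'],
                          early['event'], late['event'])
    print('log-rank p value:', result.p_value)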

As patients classified with CMML (see Table 1) are now sometimes no longer classified as having MDS, the final classifier of FIG. 3F was also analyzed within the subset of patients with only RA, RAEB or RARS. The result is shown in FIG. 4. This figure also shows the clear separation in the survival curves between the Early and Late groups. Within this patient population the hazard ratio between Early and Late groups is 0.37 (95% CI: 0.19-0.57), log-rank p value <0.001. The median survival is 23 months in the Early group compared with 52 months in the Late group. The distribution of Early and Late groups by FAB classification is shown in Table 10.

TABLE 10 Distribution of Early and Late groups by FAB classification (all patients with or without survival data)

FAB classification | Early | Late
AML | 5 | 1
RAEB-t | 9 | 8
CMML | 11 | 6
RA | 20 | 26
RAEB | 17 | 12
RARS | 7 | 7
Healthy | 9 | 10

NB: the fact that some of the healthy patients were classified as Early could be due to a variety of factors, including: (a) we could be measuring something that is also present in non-MDS patients, for example the level at which a person's immune system functions; and (b) the feature values of healthy patients may be so different from the rest of the training set that attempting to classify a healthy patient returns an essentially random answer; the features lie far off in feature space compared with the MDS patients, so asking which MDS patient is a nearest neighbor is practically meaningless and the result more or less random.

Possible correlations between the classifier labels, Early and Late, and known prognostic factors, such as IPSS score and karyotype, were investigated. The breakdown of Early and Late groups by other prognostic factors is shown in Table 11.

TABLE 11 Distribution of Early and Late groups by other prognostic factors (patients with RA, RAEB, CMML, RARS and survival data only, i.e. the population in the Kaplan-Meier plots)

Factor | | Early group (N = 51) | Late group (N = 50) | P value
IPSS score | 0 | 11 | 24 | 0.002*
 | 0.5 | 13 | 10 |
 | 1.0 | 14 | 9 |
 | 1.5 | 4 | 0 |
 | 2.0 | 5 | 0 |
 | 2.5 | 0 | 2 |
IPSS risk group | Low risk | 11 | 24 | <0.001*
 | Intermediate 1 risk | 27 | 19 |
 | Intermediate 2 risk | 9 | 0 |
 | High risk | 0 | 2 |
Genetic category | Poor | 7 | 0 | 0.060*
 | Del5 | 8 | 9 |
 | Intermediate | 6 | 6 |
 | Good | 23 | 25 |
Age | median | 70 | 72 | 0.676^
 | mean | 68.8 | 67.6 | 0.658 ^(x)
 | range | 38-93 | 19-86 |
sCD44 § | median | 707 | 530 | <0.001^
 | mean | 973 | 558 | 0.009 ^(x)
 | range | 301-7961 | 340-978 |
 | High | 22 | 3 | <0.001*
 | Low | 29 | 47 |

* Fisher's exact test; ^ Mann-Whitney test; ^(x) t-test
§ sCD44 is the serum biomarker described in Loeffler-Ragg J., et al., Serum CD44 levels predict survival in patients with low-risk myelodysplastic syndromes, Crit. Rev. Oncol. Hematol., vol. 78, no. 2, pp. 150-61 (2011).

There are significant correlations between Early and Late groups with IPSS score, and a trend to significance with genetic category. There is also a strong correlation between sCD44 category (high versus low) and Early or Late label.

To investigate any independent prognostic significance of the Early and Late classification labels, further analysis was carried out using Cox proportional hazards models stratified by FAB category. As a univariate factor in this stratified analysis, Early/Late classification is highly significant with hazard ratio 0.30 (95% CI: 0.18-0.51), p<0.001. In multivariate analysis, the classification labels of Early and Late remain significant predictors of outcome, even when adjusted for IPSS score and genetic category (hazard ratio=0.40 (95% CI: 0.21-0.74), p=0.004). IPSS score and “bad” karyotype gene classification are also simultaneously significant predictors of survival. On adjustment by sCD44 category, as defined in Loeffler-Ragg et al. (see above-cited paper), whether for Early/Late classification alone or for the combination of Early/Late classification, IPSS score, and genetic classification, the Early/Late label remains an independent predictor of survival. These results indicate that, despite the clear correlations between Early/Late label and other prognostic factors, the Early/Late label is a significant predictor of survival independent of other prognostic factors. Further details of the multivariate analyses are included in Appendix E of our prior provisional application, incorporated by reference.
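Such a stratified multivariate analysis can be sketched with the Cox proportional hazards implementation in lifelines, which accepts a strata argument; the column names below are hypothetical, and the categorical factors are assumed to have been encoded as numeric covariates beforehand. This is an illustrative sketch, not the analysis code used here.

from lifelines import CoxPHFitter

def multivariate_analysis(df):
    # df columns (hypothetical): 'months', 'event', 'late' (1 if Late),
    # 'ipss' (IPSS score), 'genetic_poor' (1 if poor karyotype), and
    # 'fab' (FAB category, used only for stratification)
    cox = CoxPHFitter()
    cox.fit(df, duration_col='months', event_col='event', strata=['fab'])
    # hazard ratios, 95% CIs and p values for each covariate
    return cox.summary[['exp(coef)', 'exp(coef) lower 95%',
                        'exp(coef) upper 95%', 'p']]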

Subgroup analyses supported the conclusion that the Early/Late label provides prognostic information additional and complementary to other existing, available prognostic factors. Kaplan-Meier plots of survival for various subgroups by Early and Late classification are shown in FIG. 5. FIG. 5A shows the plot for the patients with IPSS score of 0, FIG. 5B shows the plot for the patients with IPSS score of 0.5-1, FIG. 5C shows the plot for the patients with IPSS score of between 0 and 1; FIG. 5D shows the plot for the patients with IPSS score >1; FIG. 5E shows the plot for “Good karyotype” patients; FIG. 5F shows the plot for “Gene: del5” patients; FIG. 5G shows the plot for “Gene: Intermediate” patients; FIG. 5H shows the plot for biomarker “sCD44 low” patients, and FIG. 5I shows the plot for biomarker “sCD44 high” patients. Note that in all of these plots, there is a clear separation of the survival curves for Early and Late labeled patients.

It is apparent from FIGS. 5A-5I that the classifier label is able to stratify IPSS low risk patients (FIG. 5C) and, although there are very few IPSS high risk patients with Late classification, they have relatively very good outcomes (FIG. 5D). Within the gene-defined subtypes good, del5 and intermediate, the Early subgroup has inferior outcomes to those of the Late subgroup. The bad karyotype group only contains patients classified to the Early group (Kaplan-Meier plot for this subgroup not shown).

While there was strong correlation between Early and Late classifications and sCD44 level, the classification Early and Late still clearly stratifies the sCD44 low group (see FIG. 5H); therefore, the classification label Early or Late adds additional information to that obtained solely from a sCD44 test. Only three patients in the sCD44 high group are classified as Late (FIG. 5I) and these all have survival of at least 26 months.

In addition to creating classifiers trained on patients in all MDS subgroups, separate classifiers were also developed using the same methodology within the two largest MDS subgroups, RA and RAEB (see Table 1 for definitions). The results from each refinement of class labels are shown for the RA classifier in FIG. 6 and for the RAEB classifier in FIG. 7. In particular, FIGS. 6A-6F are Kaplan-Meier plots of survival for MDS patients with RA according to classifier label, Early or Late. The results are shown at each stage of the class label refinement process (successive iterations of loop 142 in FIG. 1). The final classifier result is shown in panel FIG. 6E, and the last panel FIG. 6F compares the results for this classifier with the stratification obtained from the classifications for the RA samples from the classifier trained on all MDS samples (shown in FIG. 3).

For the final RA classifier (FIG. 6E) the hazard ratio between Early and Late groups is 0.32 (95% CI: 0.14-0.67), log-rank p value 0.005. The median survival is 25 months (95% CI: 13-32 months) in the Early group compared with 63 months (95% CI: 37-75 months) in the Late group. There are no patient deaths in the Late group before 30 months, by which time patient survival in the Early group is less than 50%. The classifications assigned to each RA sample by this classifier are listed in Appendix D of our prior provisional application, which is incorporated by reference. The final panel FIG. 6F compares the classifications obtained for the RA samples between this classifier and the classifier trained on the set of all MDS samples. Although the difference between the two classifiers is not great, the RA-developed classifier assigns fewer patients to the Late group; in particular, two relatively early events and some patients who are censored at early times are in the Late group for the all-MDS classifier, but the Early group for the RA classifier. Hence, the RA classifier may be preferable in this subgroup for identifying very good prognosis patients.

FIGS. 7A-7F are Kaplan-Meier plots of survival for patients with RAEB according to classifier label, Early or Late. The results are shown at each stage of the class label refinement process (successive iterations of loop 142 of FIG. 1). The final classifier result is shown in panel FIG. 7E and the last panel FIG. 7F compares the results for this classifier with the stratification obtained from the classifications for the RAEB samples from the classifier trained on all MDS samples (shown in FIG. 3).

For the final RAEB classifier (FIG. 7E) the hazard ratio between Early and Late groups is 0.35 (95% CI: 0.12-0.66), log-rank p value 0.009. The median survival is 16 months (95% CI: 2-26 months) in the Early group compared with 39 months (95% CI: 12-85 months) in the Late group. The classifications assigned to each RAEB sample by this classifier are listed in Appendix D of our prior provisional application. Panel FIG. 7F compares the classifications obtained for the RAEB samples between this classifier and the classifier trained on the set of all MDS samples (FIG. 3). Similarly to the RA case, the RAEB-trained classifier assigns slightly fewer patients to the good prognosis group. However, in this case there is little difference in practical performance.

Conclusions

Classifiers were constructed with the ability to stratify MDS patients into those with better or worse survival. The classifiers perform well, as indicated in the Kaplan-Meier plots of FIGS. 3, 4, 5, 6 and 7. The groupings demonstrated a large effect size between groups in Kaplan-Meier analysis of survival. Most importantly, while the classifications generated were correlated with other prognostic factors, such as IPSS score and genetic category, multivariate and subgroup analysis showed that they had significant independent prognostic power complementary to the existing prognostic factors. In particular, the classifications obtained showed clear power to stratify patients typically defined as low risk under existing criteria and, within the refractory anemia population, the classifier was able to identify a group of patients who experienced no deaths within 30 months of follow-up. While these results have not yet been validated on an independent sample set, the classifiers of this disclosure appear to be of clinical relevance as an extra factor in assessing prognosis for MDS patients and guiding treatment of such patients.

II. Laboratory Test Center and Computer Configured as Classifier

FIG. 8 is an illustration of a laboratory testing center or system for processing a test sample (in this example a blood-based sample from an MDS patient) using a classifier generated in accordance with FIG. 1 and generating a prognostic label for the patient. The system includes a mass spectrometer 806 and a general purpose computer 810 having a CPU 812 implementing a CMC/D classifier 820 coded as machine-readable instructions, and a reference mass spectral data set including a feature table 822 of class-labeled mass spectrometry data stored in memory 814. It will be appreciated that the mass spectrometer 806 and computer 810 of FIG. 8 could be used to generate the CMC/D classifier 820 in accordance with the process of FIG. 1.

The operation of the system of FIG. 8 will be described in the context of a predictive test for prognosis of an MDS patient. The following discussion assumes that the CMC/D classifier 820 has already been generated at the time the classifier is used to generate a label or panel of labels for a test sample.

The system of FIG. 8 obtains a multitude of samples 800, e.g., blood-based samples (serum or plasma) from diverse MDS patients, and generates a label or panel of labels as a fee-for-service. The samples 800 are used by the classifier (implemented in the computer 810) to make predictions as to the prognosis of an MDS patient. The outcome of the test is a binary class label (or panel of such labels), such as Low Risk, Late or the like, or High Risk, Early, or the like. The particular moniker for the class label is not important and could be generic, such as “class 1”, “class 2” or the like, but as noted earlier the class label is associated with a clinical attribute relevant to the question being answered by the classifier, in this case prognosis.

The samples may be obtained on serum cards or the like, in which the blood-based sample is blotted onto a cellulose or other type of card. Aliquots of the sample are spotted onto several spots of a MALDI-ToF sample “plate” 802 and the plate is inserted into a MALDI-ToF mass spectrometer 806. The mass spectrometer 806 acquires mass spectra 808 from each of the spots of the sample. The mass spectra are represented in digital form and supplied to a programmed general purpose computer 810. The computer 810 includes a central processing unit 812 executing programmed instructions. The memory 814 stores the data representing the mass spectra 808. The spectral acquisition details, including deep-MALDI (100,000+ laser shots) and the spectral processing used in classifier generation (described at length above), are also used for a test sample.

The memory 814 also stores a final CMC/D classifier 820, which includes a) a reference mass spectral data set 822 in the form of a feature table of N class-labeled spectra, where N is some integer number, in this example the development set used to develop the classifier as explained above or some sub-set of the development sample set. The final CMC/D classifier includes b) code 824 representing a kNN classification algorithm (which is implemented in the mini-classifiers as explained above), c) program code 826 for executing the final classifier generated in accordance with FIG. 1 on the mass spectra of patients, including logistic regression weights and data representing the master classifier(s) forming the final classifier, and d) a data structure 828 for storing classification results, including a final class label for the test sample. The memory 814 also stores program code 830 for implementing the processing shown at 850, including: code (not shown) for acquiring the mass spectral data from the mass spectrometer in step 852; a pre-processing routine 832 for implementing the background subtraction, normalization and alignment step 854 (details explained above), and the filtering and averaging of the 800-shot spectra at multiple locations per spot and over multiple MALDI spots to make a single 100,000+ shot average spectrum (as explained above); a module (not shown) for calculating integrated intensity values at predefined m/z positions in the background subtracted, normalized and aligned spectrum (step 856); and a code routine 838 for implementing the final classifier 820 using the reference data set 822 on the values obtained at step 856. The process 858 produces a class label at step 860. The module 840 reports the class label as indicated at 860 (i.e., “Early”, “Late” or the equivalent). As explained previously, the classifier 820 may be replicated so as to constitute different classifiers, e.g., to generate a panel of classification results, each one using the same feature table 822 and kNN classification algorithm 824.
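A minimal sketch of the integrated intensity calculation of step 856 follows, assuming the processed spectrum is available as m/z and intensity arrays and the feature windows are the left/right boundaries of the kind listed in Appendix A; the names are hypothetical and this is an illustration, not the module itself.

import numpy as np

def integrated_intensities(mz, intensity, feature_windows):
    # mz, intensity: 1-D arrays for the background-subtracted, normalized,
    # aligned average spectrum; feature_windows: (left, right) m/z pairs
    values = []
    for left, right in feature_windows:
        window = (mz >= left) & (mz <= right)
        values.append(np.trapz(intensity[window], mz[window]))  # area under peak
    return np.array(values)  # one integrated intensity value per feature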

The program code 830 can include additional and optional modules, for example: a feature correction function code 836 (described in co-pending U.S. patent application Ser. No. 14/486,442) for correcting fluctuations in performance of the mass spectrometer; a set of routines for processing the spectrum from a reference sample to define a feature correction function; a module storing feature-dependent noise characteristics, generating noisy feature value realizations and classifying such noisy feature value realizations; modules storing statistical algorithms for obtaining statistical data on the performance of the classifier on the noisy feature value realizations; or modules to combine class labels defined from multiple individual replicate testing of a sample to produce a single class label for that sample. Still other optional software modules could be included, as will be apparent to persons skilled in the art.

The system of FIG. 8 can be implemented as a laboratory test processing center obtaining a multitude of patient samples from oncologists, patients, clinics, etc., and generating a class label for the patient samples as a fee-for-service. The mass spectrometer 806 need not be physically located at the laboratory test center; rather, the computer 810 could obtain the data representing the mass spectra of the test sample over a computer network.

Further Considerations

It will be noted that the classifier we generated uses the features of Appendix A (or some subset, such as 100 features selected by t-statistic) and we have not determined precisely what proteins these peaks correspond to. Nor is it necessary: what matters is classifier performance. We believe that they may involve, directly or indirectly, one or more of the protein biomarkers mentioned in the scientific literature cited at the beginning of this document. Note that, with our “deep MALDI” mass spectrometry and the use of 50, 100 or even 300 or more peaks, it is likely that our classifiers are based on still undiscovered protein biomarkers circulating in serum. Our method essentially takes advantage of the fact that we can detect these proteins, and in particular low-abundance proteins, using the >100,000 shot MALDI-TOF mass spectra, and use them in development of a classifier, even though we do not know precisely what proteins the peaks correspond to.

The following claims are offered as further description of the disclosed inventions.

Appendix A: Feature definitions Left m/z Center Right m/z  3033  3043 3054  3076  3086  3097  3099  3109  3119  3120  3130  3140  3148  3155 3162  3157  3168  3178  3176  3186  3197  3192  3201  3209  3210  3220 3230  3231  3241  3252  3255  3264  3274  3275  3287  3299  3300  3320 3339  3357  3369  3382  3382  3395  3409  3410  3423  3436  3437  3447 3457  3457  3467  3477  3478  3487  3497  3497  3511  3525  3541  3555 3570  3570  3578  3586  3586  3596  3605  3606  3627  3649  3649  3658 3667  3668  3682  3696  3696  3706  3716  3716  3726  3737  3746  3754 3762  3764  3775  3787  3788  3796  3804  3805  3819  3833  3833  3843 3852  3885  3892  3899  3900  3908  3916  3916  3922  3929  3929  3936 3943  3943  3952  3962  3962  3971  3981  4000  4014  4028  4028  4035 4042  4042  4051  4060  4060  4069  4078  4103  4110  4116  4116  4121 4126  4276  4289  4301  4304  4315  4326  4329  4339  4350  4350  4360 4370  4371  4379  4387  4381  4391  4401  4396  4410  4425  4426  4437 4448  4449  4458  4466  4462  4472  4482  4485  4493  4500  4509  4520 4531  4540  4548  4557  4557  4568  4579  4579  4586  4592  4592  4600 4609  4616  4625  4633  4634  4644  4653  4648  4655  4662  4659  4666 4673  4699  4709  4720  4720  4731  4743  4747  4756  4764  4766  4774 4782  4783  4791  4799  4799  4809  4819  4819  4825  4831  4848  4856 4864  4865  4871  4878  4880  4892  4905  4906  4917  4927  4928  4938 4948  4949  4961  4974  5008  5018  5028  5028  5034  5041  5050  5066 5081  5067  5081  5095  5087  5104  5121  5122  5136  5150  5150  5158 5165  5166  5184  5203  5203  5212  5221  5215  5231  5247  5240  5248 5256  5275  5289  5303  5312  5325  5337  5353  5362  5371  5371  5382 5393  5386  5392  5398  5397  5408  5418  5421  5430  5439  5439  5449 5460  5467  5475  5483  5484  5492  5500  5501  5507  5513  5513  5523 5532  5541  5554  5567  5575  5587  5598  5625  5638  5651  5665  5676 5688  5688  5698  5708  5698  5709  5720  5711  5719  5727  5749  5768 5787  5802  5816  5830  5831  5855  5880  5881  5894  5906  5897  5929 5961  5924  5935  5945  5978  5988  5998  5998  6007  6017  6018  6030 6041  6050  6064  6079  6079  6088  6096  6096  6107  6118  6118  6126 6135  6135  6150  6166  6180  6194  6207  6207  6220  6232  6277  6294 6310  6322  6331  6340  6340  6352  6363  6357  6368  6379  6377  6386 6394  6394  6401  6409  6418  6433  6448  6448  6456  6465  6465  6475 6485  6487  6496  6505  6504  6512  6520  6512  6527  6542  6602  6624 6646  6647  6656  6664  6665  6675  6685  6686  6697  6708  6709  6729 6748  6786  6799  6811  6812  6830  6848  6824  6836  6848  6849  6859 6869  6869  6887  6905  6910  6919  6927  6928  6943  6958  6958  6968 6978  6978  6988  6998  7024  7049  7075  7117  7138  7160  7176  7185 7194  7190  7205  7220  7235  7247  7258  7254  7265  7276  7288  7297 7307  7344  7355  7366  7368  7386  7404  7454  7469  7485  7475  7486 7497  7493  7505  7518  7547  7565  7584  7599  7615  7631  7633  7645 7656  7657  7673  7689  7722  7737  7753  7753  7776  7799  7799  7821 7843  7917  7936  7955  7982  7995  8008  8021  8043  8066  8068  8082 8096  8117  8141  8164  8186  8203  8220  8222  8237  8252  8337  8357 8378  8399  8420  8440  8459  8476  8492  8515  8528  8541  8547  8573 8599  8613  8633  8653  8582  8651  8720  8649  8657  8665  8665  8683 8702  8750  8764  8778  8793  8809  8825  8855  8868  8880  8895  8911 8927  8922  8932  8941  8944  8958  8971  9002  9017  9032  9062  9074 9086  9101  9128  9155  9140  9167  9194  9276  9302  9328  9322  9338 9353  
9345  9357  9370  9391  9441  9491  9554  9573  9592  9607  9633 9660  9687  9707  9726  9727  9745  9763  9764  9783  9802  9897  9923 9949  9931  9963  9994 10051 10069 10088 10089 10104 10120 10157 1018910221 10243 10272 10301 10323 10337 10351 10368 10389 10409 10426 1044310460 10462 10488 10514 10516 10531 10547 10562 10579 10596 10601 1064910696 10699 10727 10755 10814 10840 10865 10913 10935 10957 10983 1100411026 11026 11046 11065 11072 11098 11125 11126 11145 11164 11268 1129811328 11339 11365 11390 11391 11403 11416 11417 11436 11455 11455 1147011485 11498 11526 11554 11607 11623 11640 11651 11680 11708 11714 1172811743 11765 11780 11794 11852 11888 11923 11924 11943 11963 11975 1201812062 12144 12167 12190 12247 12283 12319 12365 12382 12399 12420 1245012480 12531 12561 12591 12591 12611 12631 12644 12661 12678 12679 1268912700 12712 12730 12749 12818 12856 12895 12940 12957 12974 12964 1298012995 13041 13075 13109 13111 13128 13144 13144 13162 13180 13224 1323913254 13262 13274 13287 13299 13315 13331 13342 13356 13370 13371 1339013410 13517 13544 13572 13588 13607 13625 13692 13714 13735 13735 1375613777 13777 13794 13811 13821 13837 13854 13854 13875 13897 13897 1391013923 13923 13939 13954 13955 13977 14000 14010 14032 14055 14028 1405014071 14075 14089 14103 14127 14148 14169 14177 14197 14218 14220 1425014281 14366 14403 14439 14458 14483 14508 14514 14530 14546 14547 1456214577 14591 14623 14656 14746 14778 14811 14848 14878 14908 14948 1496914991 15271 15292 15313 15314 15347 15380 15518 15557 15596 15598 1563015663 15972 15989 16005 16018 16032 16047 16057 16084 16111 16150 1617516199 16442 16498 16555 16588 16661 16734 17004 17025 17046 17034 1705817081 17107 17123 17140 17140 17166 17191 17212 17273 17334 17337 1739117444 17445 17459 17474 17474 17504 17535 17544 17576 17608 17600 1763817676 17799 17889 17979 17981 18011 18041 18042 18072 18102 18103 1815118200 18228 18270 18311 18584 18621 18658 18659 18704 18749 18754 1878318813 18813 18841 18869 18870 18899 18928 19337 19484 19631 19888 1993519983 20868 20935 21002 21003 21064 21125 21126 21176 21226 21226 2128121337 21659 21699 21739

We claim:
1. A method for predicting prognosis of a myelodysplastic syndrome (MDS) patient comprising the steps of: (a) performing MALDI-TOF mass spectrometry on a blood-based sample obtained from the MDS patient by subjecting the sample to at least 100,000 laser shots and acquiring mass spectral data; (b) obtaining integrated intensity values in the mass spectral data of a multitude of pre-determined mass-spectral features; and (c) operating on the integrated intensity values of the mass spectral data with a programmed computer implementing a classifier; wherein in the operating step the classifier compares the integrated intensity values obtained in step (b) with feature values of a training set of class-labeled mass spectral data obtained from a multitude of other MDS patients using a classification algorithm and generates a class label for the sample, wherein the class label is associated with a prognosis of the MDS patient.
2. The method of claim 1, wherein the classifier is configured as a combination of filtered mini-classifiers using a regularized combination method.
3. The method of claim 1, wherein the obtaining step (b) comprises obtaining integrated intensity values of at least 50 features listed in Appendix A.
4. The method of claim 3, wherein the obtaining step comprises obtaining integrated intensity values of at least 100 features listed in Appendix A.
5. The method of claim 3, wherein the obtaining step comprises obtaining integrated intensity values of at least 300 features listed in Appendix A.
6. A classifier for predicting the prognosis of an MDS patient, comprising in combination: a memory storing a reference set of mass spectral data obtained from blood-based samples of a multitude of MDS patients; and a programmed computer configured to implement a classifier configured as a combination of filtered mini-classifiers with drop-out regularization; wherein the reference set of mass spectral data includes feature values of at least some of the m/z features listed in Appendix A.
7. The classifier of claim 6, wherein the reference set of mass spectral data includes feature values of at least 50 features listed in Appendix A.
8. The classifier of claim 6, wherein the reference set of mass spectral data includes feature values of at least 100 features listed in Appendix A.
9. The classifier of claim 6, wherein the reference set of mass spectral data includes feature values of at least 300 features listed in Appendix A.
10. A laboratory testing system for conducting tests on blood-based samples from MDS patients and predicting the prognosis of the MDS patients, comprising: a MALDI-TOF mass spectrometer configured to conduct mass spectrometry on a blood-based sample from a patient by subjecting the sample to at least 100,000 laser shots and acquire resulting mass spectral data; a memory storing a reference set of mass spectral data obtained from blood-based samples of a multitude of MDS patients and associated class labels; and a programmed computer configured to implement a classifier operating on the reference set and the resulting mass spectral data obtained from the blood-based sample from the patient; wherein the reference set of mass spectral data includes feature values of at least some of the m/z features listed in Appendix A; and wherein the programmed computer is programmed to generate a class label for the sample as an output of the classifier, wherein the class label is associated with a prognosis of the MDS patient.
11. The system of claim 10, wherein the classifier is configured as a combination of filtered mini-classifiers with drop-out regularization.